A Logic Grammar Foundation for Document Representation and Document Layout
Abstract
We present a powerful grammar-based paradigm for electronic document markup: coordinated definite clause translation grammars. This markup is of a declarative character, being, in effect, a collection of constraints on the logical and physical structure of documents. To the best of our knowledge, coordinated grammars and their parsers can accommodate all of the descriptive and layout processing functionality enjoyed by extant electronic markup languages. We describe an operational prototype that demonstrates the feasibility of a syntax-directed basis for formalizing and realizing
Similar Resources
A New Document Embedding Method for News Classification
Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into predefined categories. A great deal of news spreads on the web. A text classifier can categorize news automatically, which facilitates and accelerates access to it. The first step in text classification is to represent documents in a suitable way t...
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcomings in capturing the semantic concepts of text have motivated researchers to use...
A Comparative Study of Agency in the Signing of Commercial Instruments (Bill of Exchange, Promissory Note, and Cheque)
With respect to Article 227 of the Commerce Law and Article 19 on drawing cheques, which concern appointing a representative for the issuance of drafts and cheques, the following questions have always arisen: a) whether this representation exists only at the time of signing a document or also at other stages, such as endorsement and assurance; b) whether the responsibility of s...
Graph Grammar Based Analysis System of Complex Table Form Document
Structure analysis of table-form documents is important because printed documents, and electronic documents as well, explicitly provide only geometrical layout and lexical information. To handle these documents automatically, logical structure information is necessary. In this paper, we first propose a general representation of table-form documents based on XML, which contains both structure and lay...
The Representation of Social Actors in the Graduate Employability Issue: Online News and the Government Document
This paper presents the first part of a larger study on the issue of graduate employability in Malaysia as construed in public discourse in English, a language of power in Malaysia. The term employability itself has many definitions depending on the requirements of government and industry, and in the case of Malaysia, the English-language ability of graduates is inseparable from graduate employ...